Abstract: We discuss error floor asymptotics and present a method for 
improving the performance of low-density parity check (LDPC) codes 
in the high SNR (error floor) region. The method is based on Tanner 
graph covers that do not have trapping sets from the original code. The 
advantages of the method are that it is universal, as it can be applied to 
any LDPC code/channel/decoding algorithm and it improves performance 
at the expense of increasing the code length, without losing the code 
regularity, without changing the decoding algorithm, and, under certain 
conditions, without lowering the code rate. The proposed method can 
be modified to construct convolutional LDPC codes also. The method is 
illustrated by modifying Tanner, MacKay and Margulis codes to improve 
performance on the binary symmetric channel (BSC) under the Gallager 
B decoding algorithm. Decoding results on the AWGN channel are also 
presented to illustrate that optimizing codes for one channel/decoding 
algorithm can lead to performance improvement on other channels. 

Index Terms — convolutional LDPC codes, error floor, Gallager B, 
LDPC codes, min-sum decoding algorithm, Tanner code, trapping sets. 

I. Introduction 

The error-floor problem is arguably the most important problem 
in the theory of low-density parity check (LDPC) codes and iterative 
decoding algorithms. Roughly, the error floor is an abrupt change in the 
frame error rate (FER) performance of an iterative decoder in the 
high signal-to-noise ratio (SNR) region (see [9] for more details and 
[1], [2], [3] for general theory of LDPC codes). 

The error floor problem for iterative decoding on the binary erasure 
channel (BEC) is now well understood, see [7], [8] and the references 
therein. 

In the case of the additive white Gaussian noise (AWGN) channel, 
MacKay and Postol in [4] pointed out a weakness in the construction 
of the Margulis code [22] which led to high error floors. Richardson 
[9] presented a method to estimate error floors of LDPC codes and 
presented results on the AWGN channel. He pointed out that the 
decoder performance is governed by a small number of likely error 
events related to certain topological structures in the Tanner graph 
of the code, called trapping sets (or stopping sets on the BEC [7]). 
The approach from [9] was further refined by Stepanov et al. in 
[10]. Zhang et al. [11] presented similar results based on hardware 
decoder implementation. Vontobel and Koetter [12] established a 
theoretical framework for finite length analysis of message passing 
iterative decoding based on graph covers. This approach was used 
by Smarandache et al. to analyze the performance of LDPC 
codes from projective geometries [13] and of LDPC convolutional 
codes [14]. 

An early account of the most likely error events on the binary 
symmetric channel (BSC) for codes whose Tanner graphs have cycles 
is given by Forney et al. in [16]. Some results on LDPC codes over 
the BSC appear in [13] as well. 

A significant part of the research on error floor analysis has 
also focused on methods for lowering the error floor. The two 
distinct approaches taken to tackle this problem are (1) modifying 
the decoding algorithm and (2) constructing codes avoiding certain 
topological structures. Numerous modifications of the sum-product 
decoding algorithm were proposed, see, for example, [18] and [19], 
among others. 

Among the methods from the second group, there have been novel 
constructions of codes with high Tanner graph girth [21], [6], as 
it was observed that codes with low girth tend to have high error 
floors. While it is true that known trapping sets have short cycles 
[10], [17], the example of projective geometry codes, which have short 
cycles but perform well under (hard decision) iterative decoding, 
suggests that maximizing the girth is not the optimal procedure. As 
the understanding of the error floor phenomenon and its connection 
with trapping sets grows, avoiding the trapping sets directly (rather 
than short cycles) seems to be a more efficient way (in terms of code 
rate and decoding complexity) to suppress error floors. 

Code modification for improving the performance on the binary 
erasure channel (BEC) was studied by Wang in [20]. To the best 
of our knowledge, it is the first paper on code modification with 
maximizing the size of stopping (or trapping) sets as the objective. 
Edge swapping within the code was suggested as a way to break 
the stopping sets. The method that we propose is similar. Roughly 
speaking, it consists of taking two (or more) copies of the same code 
and swapping edges between the code copies in such a way that the 
most dominant trapping sets are broken. It is also similar to the code 
constructions that appear in Smarandache et al. [14], Thorpe [24], 
Divsalar and Jones [25] and Kelley, Sridhara and Rosenthal [26]. 

The advantages of the method are: (a) it is universal as it can 
be applied to any code/channel model/decoding algorithm and (b) 
it improves performance at the expense of increasing the code 
length only, without losing the code regularity, without changing the 
decoding algorithm, and, under certain conditions, without lowering 
the code rate. If the length of the code is fixed to n, the method can 
be applied by taking t copies of a (good) code C of length n/t and 
eliminating the most dominant trapping sets of C. The method can 
be slightly modified to construct convolutional LDPC codes as well. 
The details are given in Section III. 

We apply our method and construct codes based on Margulis [22], 
Tanner [21] and MacKay [23] codes and present results on the BSC 
when decoded using the Gallager B algorithm [1]. It is worth noting 
that the error floor on the AWGN channel depends not only on the 
structure of the code but also on implementation nuances of the 
decoding algorithm, such as numerical precision of messages [9]. 
Since the Gallager B algorithm operates by passing binary messages 
along the edges of a graph, any concern about the numerical precision 
of messages does not arise. 

The rest of the paper is organized as follows. In Section II 
we introduce the notion of trapping sets and their relation to the 
performance of the code. We explain the proposed method in Section 
III. We present numerical results in Section IV and conclude in 
Section V. 

II. Basic Concepts 

The Tanner graph of an LDPC code, $\mathcal{G}$, is a bipartite graph with 
two sets of nodes: variable (bit) nodes and check (constraint) nodes. 
The nodes connected to a certain node are referred to as its neighbors. 
The degree of a node is the number of its neighbors. The girth g is the 
length of the shortest cycle in $\mathcal{G}$. In this paper, • represents a variable 
node, □ represents an even degree check node and ■ represents an 
odd degree check node. 

The notion of trapping sets was first introduced in [4], but here we 
follow the formalism from [19]. 

Definition 1: For a given $m \times n$ matrix $U = (U_{i,j})$ with $1 \le i \le m$, 
$1 \le j \le n$, the projection of a set of $h$ columns indexed by 
$j_1, j_2, \ldots, j_h$ is an $m \times h$ matrix consisting of the elements $U_{i,j}$, 
$1 \le i \le m$, $j = j_1, j_2, \ldots, j_h$. 

Definition 2: Let $H$ be a parity check matrix of an LDPC code. 
An $(a, b)$ trapping set $\mathcal{T}$ is a set of $a$ columns of $H$ with a projection 
that contains $b > 0$ odd weight rows. 

The definition of the trapping set above is purely topological, 
that is, a trapping set can be seen as a subgraph of the Tanner 
graph. In other words, an (a, b) trapping set T is a subgraph with 
a variable nodes and b odd degree checks. The most probable noise 
realizations that lead to decoding failure are related to trapping sets 
([9], [10]). A measure of noise realization probability is referred to as 
pseudo-weight. Following the terminology in [10], an instanton can 
be defined as the most likely noise realization that leads to decoding 
failure. 
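
The topological definition can be checked mechanically. The following is a minimal Python sketch, assuming the parity check matrix is stored as a list of 0/1 rows; the function names and the toy matrix are hypothetical and are not taken from the paper. It computes the projection of Definition 1 and counts its odd weight rows to obtain the pair (a, b) of Definition 2.

```python
def projection(H, cols):
    """Return the m x h projection of H onto the given column indices."""
    return [[row[j] for j in cols] for row in H]


def trapping_set_parameters(H, cols):
    """Return (a, b): the number of columns and the number of odd weight rows."""
    proj = projection(H, cols)
    b = sum(1 for row in proj if sum(row) % 2 == 1)
    return len(cols), b


# Hypothetical toy parity check matrix, used only for illustration.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

print(trapping_set_parameters(H, [0, 1]))     # columns 0 and 1
print(trapping_set_parameters(H, [0, 1, 2]))  # columns 0, 1 and 2
```

In this toy matrix, columns 0 and 1 leave two checks with odd degree, while columns 0, 1 and 2 leave none, so the latter set is the support of a codeword rather than a trapping set with b > 0.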

The instantons on the BSC consist of the received bit configurations 
with minimal number of erroneous bits that lead to decoding failure. 
Following [17], the notion specific to BSC, analogous to pseudo- 
weight, can be defined as: 

Definition 3: The minimal number of variable nodes that have to 
be initially in error for the decoder to end up in the trapping set $\mathcal{T}$ 
will be referred to as the critical number k for that trapping set. 

Remark: To "end up" in a trapping set $\mathcal{T}$ means that, after a finite 
number of iterations, the decoder will be in error, on at least one 
variable node from $\mathcal{T}$, at every iteration. Note that the variable nodes 
that are initially in error do not have to be within the trapping set. 

We illustrate the above concepts with an example. 




Fig. 1. Trapping sets: (a) (5, 3) trapping set; (b) (4, 4) trapping set. 

Example 1: The (5, 3) trapping set in Fig. 1(a) appears (among 
other codes) in the Tanner (155, 64) code [17] (see also the examples 
of irreducible closed walks in Chapter 6.1 of [5]). This trapping 
set has critical number k = 3 under the Gallager B decoding 
algorithm (for the definition of the algorithm see [2]), meaning that 
if the three variable nodes on the diagonal from bottom left to top right 
are initially in error, the decoder will fail to correct the errors. 

Fig. 1(b) illustrates a (4, 4) trapping set. This trapping set, although 
smaller, has critical number k = 4 (all the variable nodes have to be 
in error initially for the decoder to fail). So, if a code has both (5, 3) 
and (4, 4) trapping sets, the FER performance is dominated by the 
(5, 3) trapping set. 

At the end of this example, we note that the (5, 3) trapping set 
above is an example of an oscillatory trapping set, i.e., if the three variable 
nodes on the diagonal are initially in error, after the first iteration 
those three nodes will be decoded correctly, but the remaining two 
will be in error. In the decoding attempt after the second iteration 
those two will be correct, but the initial three will be in error again, 
and so on. 
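
To experiment with critical numbers and oscillatory behavior such as that in Example 1, a hard-decision decoder is needed. The following is a minimal Python sketch of a Gallager B style decoder, not the implementation used for the results in this paper. Several details are assumptions made for illustration: the flipping threshold of 2 (the usual choice for column weight 3), the final decision by majority vote over the channel value and all incoming check messages with ties resolved in favor of the channel value, and the toy 7 x 7 circulant matrix, which is not one of the codes studied here.

```python
def gallager_b(H, received, threshold=2, max_iter=50):
    """Hard-decision message passing on the Tanner graph of H.

    Returns (decision, converged). `threshold` is the number of other
    incoming check messages that must disagree with the channel value
    before a variable node sends the flipped value (an assumed setting).
    """
    m, n = len(H), len(H[0])
    checks_of = [[c for c in range(m) if H[c][v]] for v in range(n)]
    vars_of = [[v for v in range(n) if H[c][v]] for c in range(m)]
    # Variable-to-check messages start as the received channel values.
    v2c = {(v, c): received[v] for v in range(n) for c in checks_of[v]}
    decision = list(received)
    for _ in range(max_iter):
        # Check update: XOR of the messages from all other neighbors.
        c2v = {}
        for c in range(m):
            parity = sum(v2c[(v, c)] for v in vars_of[c]) % 2
            for v in vars_of[c]:
                c2v[(c, v)] = parity ^ v2c[(v, c)]
        # Tentative decision: majority vote, ties go to the channel value.
        decision = []
        for v in range(n):
            ones = received[v] + sum(c2v[(c, v)] for c in checks_of[v])
            votes = 1 + len(checks_of[v])
            if 2 * ones == votes:
                decision.append(received[v])
            else:
                decision.append(1 if 2 * ones > votes else 0)
        if all(sum(decision[v] for v in vars_of[c]) % 2 == 0 for c in range(m)):
            return decision, True
        # Variable update: flip the channel value on an edge only if enough
        # of the other incoming check messages disagree with it.
        for v in range(n):
            for c in checks_of[v]:
                disagree = sum(1 for c2 in checks_of[v]
                               if c2 != c and c2v[(c2, v)] != received[v])
                v2c[(v, c)] = received[v] ^ 1 if disagree >= threshold else received[v]
    return decision, False


# Toy circulant matrix: row r has ones in columns r, r + 1 and r + 3 (mod 7).
H = [[1 if (j - r) % 7 in (0, 1, 3) else 0 for j in range(7)] for r in range(7)]
received = [0, 0, 1, 0, 0, 0, 0]  # all-zero codeword with a single bit flipped
print(gallager_b(H, received))
```

Feeding the decoder every error pattern of weight k supported near a given subgraph, for increasing k, is one brute-force way to estimate a critical number on small examples.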

Remark: Note that on the BEC the critical number is just the size of 
the stopping set, see [20]. 

We now clarify what "the most dominant trapping sets" means and 
how these affect code performance. 

Let $\alpha$ be the transition probability of the BSC and $c_k$ be the number 
of configurations of received bits for which $k$ channel errors lead to 
a codeword (frame) error. The frame error rate (FER) is given by: 

$$\mathrm{FER}(\alpha) = \sum_{k=i}^{n} c_k \alpha^k (1-\alpha)^{n-k}$$

where $i$ is the minimal number of channel errors that can lead to a 
decoding error (size of instantons) and $n$ is the length of the code. 

On a semilog scale the FER is given by the expression 

$$\log(\mathrm{FER}(\alpha)) = \log\left(\sum_{k=i}^{n} c_k \alpha^k (1-\alpha)^{n-k}\right) \tag{1}$$

$$= \log(c_i) + i \log(\alpha) + \log\left((1-\alpha)^{n-i}\right) \tag{2}$$

$$+ \log\left(1 + \frac{c_{i+1}}{c_i}\alpha(1-\alpha)^{-1} + \ldots + \frac{c_n}{c_i}\alpha^{n-i}(1-\alpha)^{i-n}\right) \tag{3}$$

In the limit $\alpha \to 0$ we note that 

$$\lim_{\alpha \to 0}\left[\log\left((1-\alpha)^{n-i}\right)\right] = 0$$

and 

$$\lim_{\alpha \to 0}\left[\log\left(1 + \frac{c_{i+1}}{c_i}\alpha(1-\alpha)^{-1} + \ldots + \frac{c_n}{c_i}\alpha^{n-i}(1-\alpha)^{i-n}\right)\right] = 0$$

So, the behavior of the FER curve for small $\alpha$ is dominated by 

$$\log(\mathrm{FER}(\alpha)) \approx \log(c_i) + i \log(\alpha)$$

The $\log(\mathrm{FER})$ vs $\log(\alpha)$ graph is close to a straight line with 
slope equal to $i$, the minimal critical number or cardinality of the 
instantons. 

Therefore, if two codes $C_1$ and $C_2$ have instanton sizes $i_1$ and 
$i_2$, such that $i_1 < i_2$, then the code $C_2$ will perform better than $C_1$ 
for small enough $\alpha$, independent of the number of instantons, just 
because $\log(\alpha) \to -\infty$ as $\alpha \to 0$. Note also that the critical number 
of the most dominant trapping sets cannot be greater than half the 
minimum distance. If it is the case, the performance of the decoder 
is dominated by the minimum weight codewords. 
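
The slope argument can be illustrated numerically. The sketch below evaluates the FER sum for two sets of coefficients and estimates the local slope of log(FER) versus log of the transition probability by a finite difference. The coefficient values, the code length and the function names are hypothetical and chosen only to make the effect visible; they do not describe any code from this paper.

```python
import math


def log10_fer(alpha, counts, n):
    """log10 of FER(alpha) = sum_k c_k * alpha^k * (1 - alpha)^(n - k).

    `counts` maps k to c_k; only nonzero coefficients need to be listed.
    """
    fer = sum(c * alpha**k * (1 - alpha) ** (n - k) for k, c in counts.items())
    return math.log10(fer)


def local_slope(alpha, counts, n, step=1.01):
    """Finite-difference slope of log(FER) versus log(alpha)."""
    rise = log10_fer(alpha * step, counts, n) - log10_fer(alpha, counts, n)
    return rise / math.log10(step)


n = 155
code_1 = {3: 1e5, 4: 1e7}  # hypothetical: instanton size i = 3
code_2 = {4: 1e8, 5: 1e9}  # hypothetical: i = 4, but many more instantons

for alpha in (1e-2, 1e-4, 1e-6):
    print(alpha,
          round(local_slope(alpha, code_1, n), 3),
          round(local_slope(alpha, code_2, n), 3))
```

With these assumed coefficients the second code is worse at a transition probability of 1e-2 because it has many more instantons, but it is better at 1e-6, where the slopes approach 3 and 4 respectively.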

III. The Method for Eliminating Trapping Sets 

In this section we present a method to construct an LDPC code 
of length 2n from a given code C of length n and discuss a 
modification of the method that gives a convolutional LDPC code 
based on C. 

Let $H$ and $H^{(2)}$ represent the parity check matrices of $C$ and 
$C^{(2)}$ respectively. $H^{(2)}$ is initialized to 

$$H^{(2)} = \begin{bmatrix} H & 0 \\ 0 & H \end{bmatrix}$$

Stated simply, $H^{(2)}$ is formed by taking two copies of $H$, say $C_1$ and 
$C_2$. It can be seen that if $H$ has dimensions $m \times n$, then $H^{(2)}$ has 
dimensions $2m \times 2n$. Every edge $e$ in the Tanner graph $\mathcal{G}$ of $C$ is 
associated with a nonzero entry $H_{t,k}$. The operation of changing the 
value of $H^{(2)}_{t,k}$ and $H^{(2)}_{m+t,n+k}$ to "0", and $H^{(2)}_{m+t,k}$ and $H^{(2)}_{t,n+k}$ to 
"1" is termed as swapping the edge $e$. Fig. 2 illustrates edge swapping 
in two copies of a (5, 3) trapping set. We assume that the most 
dominant trapping sets for $C$ are known. The method can be described 
in the following steps. 

Fig. 2. Trapping set elimination 

Algorithm: 

1) Take two copies $C_1$ and $C_2$ of the same code. Since the 
codes are identical they have the same trapping sets. Initialize 
SwappedEdges $= \emptyset$; FrozenEdges $= \emptyset$. 

2) Order the trapping sets by their critical numbers. 

3) Choose a trapping set $\mathcal{T}_1$ in the Tanner graph of $C_1$, with 
minimal critical number. Let $E_{\mathcal{T}_1}$ denote the set of all edges 
in $\mathcal{T}_1$. If ($E_{\mathcal{T}_1} \cap$ SwappedEdges $\ne \emptyset$) goto 5, else goto 4. 

4) Swap an arbitrarily chosen edge $e \in E_{\mathcal{T}_1} \setminus$ FrozenEdges (if it 
exists). Set SwappedEdges $=$ SwappedEdges $\cup \{e\}$. 

5) "Freeze" the edges $E_{\mathcal{T}_1}$ from $\mathcal{T}_1$ so that they cannot 
be swapped in the following steps. Set FrozenEdges $=$ 
FrozenEdges $\cup E_{\mathcal{T}_1}$. 

6) Repeat steps 2 to 5 for as long as trapping sets of the desired 
size can be removed. 

Step 5 is needed because swapping additional edges from the (former) 
trapping sets might introduce trapping sets with the same critical 
number again. Fig. 3 illustrates such a swapping, which corresponds 
to just interchanging the check nodes. 
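
The steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions rather than the implementation used for the numerical results: the dominant trapping sets are supplied as lists of edges (t, k) with H[t][k] = 1 together with their critical numbers, each trapping set is visited once in order of critical number, a single swapped edge is taken to break a trapping set, and the toy matrix and trapping set lists are hypothetical.

```python
import random


def double_cover(H):
    """Initial H2 = [[H, 0], [0, H]] for an m x n matrix H."""
    n = len(H[0])
    return [row + [0] * n for row in H] + [[0] * n + row for row in H]


def swap_edge(H2, t, k, m, n):
    """Swap edge (t, k): reconnect its two copies across the code copies."""
    H2[t][k] = H2[m + t][n + k] = 0
    H2[t][n + k] = H2[m + t][k] = 1


def eliminate(H, trapping_sets, seed=0):
    """trapping_sets: list of (critical_number, edges), edges as (t, k) pairs."""
    rng = random.Random(seed)
    m, n = len(H), len(H[0])
    H2 = double_cover(H)
    swapped, frozen = set(), set()
    # Step 2: order by critical number; step 3: take them one at a time.
    for _, edges in sorted(trapping_sets, key=lambda ts: ts[0]):
        edges = set(edges)
        if not edges & swapped:                  # not broken yet
            candidates = sorted(edges - frozen)  # step 4: pick a free edge
            if candidates:
                t, k = rng.choice(candidates)
                swap_edge(H2, t, k, m, n)
                swapped.add((t, k))
        frozen |= edges                          # step 5: freeze its edges
    return H2, swapped


# Hypothetical toy input, used only to exercise the functions.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
trapping_sets = [
    (3, [(0, 0), (0, 1), (1, 1)]),
    (3, [(1, 1), (1, 2), (2, 2)]),
    (4, [(2, 0), (2, 2)]),
]
H2, swapped = eliminate(H, trapping_sets)
print(sorted(swapped))
```

Because a swap only moves a "1" between the two copies within the same row and the same column position, the row and column weights of the original matrix are preserved, which is why the code regularity is not lost.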




Fig. 3. Reintroducing trapping set by swapping two edges 

The Tanner graph of the newly made code is a special double cover 
of the original code's Tanner graph; interested readers are referred to 
[12]. 

Remark: There are several approaches which may improve the 
efficiency of the algorithm. First, instead of swapping the edges 
at random in step 4, edges could be swapped based on the number of 
trapping sets they participate in, or by using some other schedule 
that would (potentially) lead to the highest number of trapping 
sets eliminated. The structure of the code can also be exploited. 
For example, the Margulis (2640, 1320) code [22] has 1320 (4, 4) 
minimal trapping sets with the property that each trapping set has one 
edge that does not participate in any other minimal trapping set. So, 
instead of swapping edges at random, the edges appearing in only 
one trapping set can be swapped, and such a procedure is guaranteed 
to eliminate all the minimal trapping sets. Also, there is a possibility 
not to freeze all the edges from the (former) trapping sets, but only 
those that would, if swapped, introduce the trapping sets with the 
same critical number. 

Note, however, that any edge swapping schedule can be seen as a 
particular realization of the random edge swapping. For all the codes 
that we considered, all trapping sets with minimal critical number 
were eliminated by the algorithm with random edge swapping. 



The following theorem shows how this method affects the code 
rate. 

Theorem 1: If the code $C$, with parity check matrix $H$ and rate 
$r$ (and length $n$), is used in the algorithm above, the resulting code 
$C^{(2)}$ will have rate $r^{(2)}$ (and length $2n$), such that $r^{(2)} \le r$. 

Proof: Each edge swapping operation in the algorithm can be seen 
as a matrix modification. At the end of the algorithm, the code $C^{(2)}$ is 
determined by 

$$H^{(2)} = \begin{bmatrix} H' & B \\ B & H' \end{bmatrix}$$

where $H'$ and $B$ are matrices such that $H' + B = H$, and $H'_{t,k}$ (or 
$B_{t,k}$) can be equal to "1" only if $H_{t,k} = 1$. 

If the second block row is added to the first in $H^{(2)}$, and then the 
first block column is added to the second, we end up with 

$$\begin{bmatrix} H' & B \\ B & H' \end{bmatrix} \rightarrow \begin{bmatrix} H & H \\ B & H' \end{bmatrix} \rightarrow \begin{bmatrix} H & 0 \\ B & H \end{bmatrix} \tag{4}$$

The last matrix in Eq. (4) has rank which is greater than or equal to twice 
the rank of $H$. Therefore, the code $C^{(2)}$ has rate $r^{(2)} \le r$, where $r$ 
is the rate of $C$. □ 

Note that $r^{(2)} = r$ if $B = CH + HD$, for some matrices $C$ and 
$D$, so that $CH$ corresponds to linear combinations of rows of $H$ and 
$HD$ corresponds to linear combinations of columns of $H$. We also 
have the following corollary. 

Corollary 1: If the matrix $H$ has full rank, then $r^{(2)} = r$. 

Proof: This follows from the fact that if $H$ has full rank, then the 
last matrix in Eq. (4) has full rank also. □ 
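
Theorem 1 and Corollary 1 can be checked numerically on small matrices. The following Python sketch computes ranks over GF(2) and compares the rate before and after the modification. The helper names, the two toy matrices and the choice of swapped edge are assumptions made for illustration.

```python
def gf2_rank(M):
    """Rank of a 0/1 matrix over GF(2); each row is packed into an integer."""
    rows = [int("".join(str(x) for x in r), 2) for r in M]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def cover_matrix(H, swapped):
    """Build H2 = [[H', B], [B, H']], where B holds the swapped edges of H."""
    B = [[1 if (t, k) in swapped else 0 for k in range(len(H[0]))]
         for t in range(len(H))]
    Hp = [[h ^ b for h, b in zip(hr, br)] for hr, br in zip(H, B)]
    return ([a + b for a, b in zip(Hp, B)] +
            [b + a for a, b in zip(Hp, B)])


def rate(M):
    """Code rate 1 - rank(M) / n of the code with parity check matrix M."""
    return 1 - gf2_rank(M) / len(M[0])


# Hypothetical toy matrices: one with full row rank, one with a redundant row.
H_full = [[1, 1, 0, 1, 0, 0], [0, 1, 1, 0, 1, 0], [1, 0, 1, 0, 0, 1]]
H_dep = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0]]

for H in (H_full, H_dep):
    H2 = cover_matrix(H, {(0, 0)})  # swap the single edge (0, 0)
    print(rate(H), rate(H2))
```

For the full rank matrix the two rates coincide, as Corollary 1 states, while for the matrix with a redundant row the rate of the modified code can only stay the same or decrease.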

At the end of this section, we briefly discuss the minimal distance 
of the modified code. 

Theorem 2: If the code $C$ has minimal distance $d_{\min}$, the modified 
code $C^{(2)}$ will have the minimal distance $d^{(2)}_{\min}$, such that 
$2d_{\min} \ge d^{(2)}_{\min} \ge d_{\min}$. 

Proof: We first prove that $d^{(2)}_{\min} \ge d_{\min}$. Suppose that the minimal 
weight codeword of $C^{(2)}$ is $c^{(2)}$, where $c^{(2)}$ is a column vector 
consisting of two vectors $c_1$ and $c_2$ of length $n$. Then $H^{(2)} c^{(2)} = 0$ 
is equivalent to 

$$\begin{bmatrix} H' & B \\ B & H' \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} H'c_1 + Bc_2 \\ Bc_1 + H'c_2 \end{bmatrix} = 0 \tag{5}$$

Note that $c_1 + c_2 = c$ is a vector with Hamming weight 
$w_h(c) \le w_h(c^{(2)})$, where $w_h(c^{(2)})$ is the Hamming weight of $c^{(2)}$. Now: 

$$Hc = (H' + B)(c_1 + c_2) = H'c_1 + Bc_1 + H'c_2 + Bc_2 = 0 \tag{6}$$

because the last expression in Eq. (6) is equal to the sum of entries 
of the last column vector in Eq. (5). So, $c$ is a codeword of $C$. 

If $c \ne 0$, from $w_h(c) \le w_h(c^{(2)})$ it follows that $d^{(2)}_{\min} \ge d_{\min}$. 
If $c = 0$ then $c_1 = c_2$, and from Eq. (5) it follows that $Hc_1 = 0$, so 
$c_1$ is a codeword of $C$ and again $d^{(2)}_{\min} \ge d_{\min}$. 

To prove that $2d_{\min} \ge d^{(2)}_{\min}$, note that if $c_1$ is 
a minimal weight codeword of $C$, we have: 

$$\begin{bmatrix} H' & B \\ B & H' \end{bmatrix} \begin{bmatrix} c_1 \\ c_1 \end{bmatrix} = 0 \tag{7}$$

so $2d_{\min} \ge d^{(2)}_{\min}$. 

We finish this proof by mentioning that it is not difficult to 
construct examples where $2d_{\min} = d^{(2)}_{\min}$ or $d^{(2)}_{\min} = d_{\min}$, so the 
statement of the theorem is "sharp". □ 
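
For very short codes the bounds of Theorem 2 can be verified by exhaustive search. The Python sketch below is only practical for toy lengths, since it enumerates all binary vectors; the matrix, the swapped edges and the helper names are hypothetical.

```python
from itertools import product


def min_distance(H):
    """Smallest weight of a nonzero x with H x = 0 over GF(2) (brute force)."""
    best = None
    for x in product((0, 1), repeat=len(H[0])):
        weight = sum(x)
        if weight == 0 or (best is not None and weight >= best):
            continue
        if all(sum(h & b for h, b in zip(row, x)) % 2 == 0 for row in H):
            best = weight
    return best


def cover_matrix(H, swapped):
    """Build H2 = [[H', B], [B, H']], where B holds the swapped edges of H."""
    B = [[1 if (t, k) in swapped else 0 for k in range(len(H[0]))]
         for t in range(len(H))]
    Hp = [[h ^ b for h, b in zip(hr, br)] for hr, br in zip(H, B)]
    return ([a + b for a, b in zip(Hp, B)] +
            [b + a for a, b in zip(Hp, B)])


# Hypothetical toy code of length 6; the modified code has length 12.
H = [[1, 1, 0, 1, 0, 0], [0, 1, 1, 0, 1, 0], [1, 0, 1, 0, 0, 1]]
H2 = cover_matrix(H, {(0, 0), (1, 2)})

d = min_distance(H)
d2 = min_distance(H2)
print(d, d2)
```

Whatever set of edges is swapped, the printed pair should satisfy the two inequalities of the theorem.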

We described the algorithm in its basic form. $H^{(2)}$ can be initialized 
by interleaving the copies $C_1$ and $C_2$ in an arbitrary order, but 
we choose concatenation to keep the notation simple. The method, 
as well as all the proofs, will hold for any interleaving. It is also 
possible to consider more than two copies of the code to further 
eliminate trapping sets with higher critical number. 

The splitting of parity check matrix H into H' and B can be seen 
as a way to construct convolutional LDPC codes, that is, as a way 
to unwrap the original LDPC code C. For details on unwrapping see 
[15] and the references therein. The (infinite) parity check matrix can 
be constructed as: 

$$\begin{bmatrix} H' & & & \\ B & H' & & \\ & B & H' & \\ & & B & \ddots \end{bmatrix} \tag{8}$$



Note that by construction the resulting convolutional code has 
pseudo-codewords with higher pseudo-weights than the original LDPC 
code. In this light, Theorem 2 can be seen as a generalization of 
Lemma 2.4 from [14]. We refer readers interested in convolutional 
LDPC codes to that paper. 
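
A finite window of the band structure in Eq. (8) can be assembled directly from the two parts of the split matrix. The Python sketch below is an illustration under assumptions: the function name, the truncation to a fixed number of periods and the tiny example matrices are not from the paper.

```python
def unwrapped_window(Hp, B, periods):
    """First `periods` block rows and columns of the band matrix that has
    H' on the block diagonal and B directly below it."""
    m, n = len(Hp), len(Hp[0])
    rows = []
    for i in range(periods):
        for t in range(m):
            row = [0] * (n * periods)
            row[i * n:(i + 1) * n] = Hp[t]     # diagonal block H'
            if i > 0:
                row[(i - 1) * n:i * n] = B[t]  # sub-diagonal block B
            rows.append(row)
    return rows


# Hypothetical 1 x 2 split with H = [[1, 1]], H' = [[1, 0]] and B = [[0, 1]].
for row in unwrapped_window([[1, 0]], [[0, 1]], 3):
    print(row)
```

Truncating the infinite matrix in this way drops the B block that would follow the last period, so the window is only a local picture of the convolutional code.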

IV. Numerical Results 

In this section we illustrate the proposed method by modifying 
the Margulis [22], Tanner [21] and MacKay [23] codes to eliminate 
trapping sets under the Gallager B decoding algorithm. We use the 
trapping sets reported in [17]. 

Example 2: (Margulis (2640, 1320) code) The parity check matrix of this 
code has full rank, so the modified code is a (5280, 2640) code 
and has the same rate as the original code, i.e., $r^{(2)} = r = 0.5$. 

This code has 1320 (4, 4) trapping sets with critical number 4 
as the most dominant ones. The modified (5280, 2640) code has 
no (4, 4) trapping sets and the performance is governed by (5, 5) 
trapping sets (ten cycles) that have critical number k = 5 (see Fig. 4). 



Fig. 4. Margulis code performance. Curves: Margulis code (slope ≈ 4) and modified Margulis code (slope ≈ 5); x-axis: transition probability (α). 



Example 3: (Tanner (155, 64) code) This code has (5, 3) trapping 
sets (Fig. 1(a)) with critical number k = 3 as the most dominant 
ones. There are 155 such trapping sets [17], [21]. In this case we 
used a version of the method in which it is possible to swap edges 
from the (former) trapping sets, if no trapping set of the same or 
smaller critical number is introduced. The result was a (310, 126) 
code for which the minimal trapping sets are of type (4, 4) (eight cycles) 
with critical number k = 4 (see Fig. 1(b)). This was confirmed by 
numerical simulations in Fig. 5. The FER curve changes slope 
for higher α, where the FER contribution from the expression (3) is not 
negligible. Note that there was a small rate penalty to this procedure. 
The original Tanner code has rate 0.4129, whereas the modified code 
has rate 0.4065. 



Fig. 5. Tanner code performance for a longer range of α. Curves: Tanner code and modified Tanner code; x-axis: transition probability (α). 



Example 4: (MacKay's (1008, 504) codes) This is an example of 
how the method can be used to produce better codes of a fixed length. 
We have taken a MacKay code of length 504 and constructed a code of 
length 1008 (2 × 504). The new code performs better than MacKay 
codes of length 1008. 

Both the original 504 and 1008 length codes have two types of trapping 
sets with critical number k = 3: (5, 3) and (3, 3) (six cycles). We ran 
the algorithm so that all (3, 3) trapping sets are eliminated from the 
newly constructed code, but none of the (5, 3) trapping sets. The results are 
shown in Fig. 6. It can be seen that, although the FER performance 
is improved, the slope of the FER curve is approximately the same (see footnote 2). 

Fig. 6. MacKay's codes performance. x-axis: transition probability (α). 

Example 5: (AWGN channel) This example illustrates two points. 
The first is that optimizing a code for one decoding algorithm can lead to 
performance improvement for other decoding algorithms. The second 
point is that the use of an appropriate axis scaling can greatly help 
in error floor analysis and code performance prediction. 

We present FER results over the AWGN channel under the min-sum 
algorithm after 500 iterations for three codes: the original Tanner (155, 
64) code, our modified Tanner (310, 126) code from Example 3 and a 
random (310, 127) code with column weight 3 and row weight 5. 

In the low SNR region, where all kinds of error events are 
likely, the length (and rate) of a code govern the performance. 
In this region codes of length 310 have similar performance. For 
high SNRs, however, code optimization in terms of trapping sets 
becomes important and random code performance becomes much 
worse than performance of the modified Tanner (310, 126) code. 
Notice a pronounced error floor for the random code. 

What is even more illustrative is Fig. 7(b), where we plot 
log(FER) versus SNR (not in dB) on the x-axis. This is because 
for high SNRs on the AWGN channel, similarly to the BSC asymptotics 
in Section II, $\mathrm{FER} \propto \exp(-w_{\min} \cdot \mathrm{SNR}/2)$, where $w_{\min}$ is the 
pseudo-weight of the most likely error event. So on the graph with SNR 
on the x-axis, which is not in dB, the log(FER) curve will approach 
(from above) a straight line with slope equal to $-w_{\min}/2$ as 
$\mathrm{SNR} \to \infty$. See [5] and [12] for further 
details. Using these observations and numerical results obtained by 
simulations we can estimate that our modified code has the slope 
approximately equal to 20, better than the original Tanner (155, 64) 
code with the slope of ≈ 14. 

Footnote 2: It is possible that a more sophisticated algorithm would also eliminate 
the (5, 3) trapping sets. However, our goal with this example was to show the 
performance when some, but not all, of the trapping sets with minimal critical 
number are eliminated. 

Furthermore, considering that the slope for the random code is 
≈ 12, we can claim that, for SNR values higher than those on the 
plots, the Tanner code will perform better than the random code. 

Fig. 7. FER performance under min-sum decoding for the Tanner (155, 64) code, the random (310, 127) code and the modified Tanner (310, 126) code: (a) log(FER) versus SNR in dB; (b) log(FER) versus SNR (not in dB). 



V. Conclusion 

The proposed method allows the construction of codes with 
good FER performance, but low row/column weight (as opposed 
to projective geometry codes) and therefore relatively low decoding 
complexity. Although numerical results for the Gallager B decoder 
are presented, we reiterate that the method can be used for code 
optimization based on the trapping sets of an arbitrary decoder. 

The algorithm can also be used to determine the pseudo-weight 
spectrum of a code as follows. Once the most likely trapping sets 
(those with the smallest pseudo-weight) are determined and eliminated 
by the method, the numerically obtained decoding performance 
of the modified code, i.e., the slope of the FER curve with appropriate 
axis scaling, gives an estimate of the pseudo-weight of the next most likely 
trapping sets, just as was done in Example 5 with the Tanner 
code and the modified Tanner code. 